super arm
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (2 more...)
- Transportation > Air (0.67)
- Consumer Products & Services > Travel (0.46)
- Transportation > Air (0.68)
- Consumer Products & Services > Travel (0.46)
Combinatorial Multi-Armed Bandit with General Reward Functions
Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu
In this paper, unless otherwise specified, we use MAB to refer to stochastic MAB. MAB problem demonstrates the key tradeoff between exploration and exploitation: whether the player should stick to the choice that performs the best so far, or should try some less explored alternatives that may provide better rewards.
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference
Wang, Zichen, Hong, Haoyang, Li, Chuanhao, Li, Haoxuan, Zhang, Zhiheng, Wang, Huazheng
In multi-armed bandits with network interference (MABNI), the action taken by one node can influence the rewards of others, creating complex interdependence. While existing research on MABNI largely concentrates on minimizing regret, it often overlooks the crucial concern that an excessive emphasis on the optimal arm can undermine the inference accuracy for sub-optimal arms. Although initial efforts have been made to address this trade-off in single-unit scenarios, these challenges have become more pronounced in the context of MABNI. In this paper, we establish, for the first time, a theoretical Pareto frontier characterizing the trade-off between regret minimization and inference accuracy in adversarial (design-based) MABNI. We further introduce an anytime-valid asymptotic confidence sequence along with a corresponding algorithm, $\texttt{EXP3-N-CS}$, specifically designed to balance the trade-off between regret minimization and inference accuracy in this setting.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Oregon (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Tuscany (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada (0.04)
- (2 more...)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (2 more...)
- Transportation > Air (0.67)
- Consumer Products & Services > Travel (0.46)
- Transportation > Air (0.68)
- Consumer Products & Services > Travel (0.46)
Representative Action Selection for Large Action-Space Meta-Bandits
Zhou, Quan, Kozdoba, Mark, Mannor, Shie
We study the problem of selecting a subset from a large action space shared by a family of bandits, with the goal of achieving performance nearly matching that of using the full action space. We assume that similar actions tend to have related payoffs, modeled by a Gaussian process. To exploit this structure, we propose a simple epsilon-net algorithm to select a representative subset. We provide theoretical guarantees for its performance and compare it empirically to Thompson Sampling and Upper Confidence Bound.
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit
Combinatorial Multi-Armed Bandit with fairness constraints is a framework where multiple arms form a super arm and can be pulled in each round under uncertainty to maximize cumulative rewards while ensuring the minimum average reward required by each arm. The existing pessimistic-optimistic algorithm linearly combines virtual queue-lengths (tracking the fairness violations) and Upper Confidence Bound estimates as a weight for each arm and selects a super arm with the maximum total weight. The number of super arms could be exponential to the number of arms in many scenarios. In wireless networks, interference constraints can cause the number of super arms to grow exponentially with the number of arms. Evaluating all the feasible super arms to find the one with the maximum total weight can incur extremely high computational complexity in the pessimistic-optimistic algorithm. To avoid this, we develop a low-complexity fair learning algorithm based on the so-called pick-and-compare approach that involves randomly picking $M$ feasible super arms to evaluate. By setting $M$ to a constant, the number of comparison steps in the pessimistic-optimistic algorithm can be reduced to a constant, thereby significantly reducing the computational complexity. Our theoretical proof shows this low-complexity design incurs only a slight sacrifice in fairness and regret performance. Finally, we validate the theoretical result by extensive simulations.
- North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
Online Experimental Design With Estimation-Regret Trade-off Under Network Interference
Network interference has garnered significant interest in the field of causal inference. It reflects diverse sociological behaviors, wherein the treatment assigned to one individual within a network may influence the outcome of other individuals, such as their neighbors. To estimate the causal effect, one classical way is to randomly assign experimental candidates into different groups and compare their differences. However, in the context of sequential experiments, such treatment assignment may result in a large regret. In this paper, we develop a unified interference-based online experimental design framework. Compared to existing literature, we expand the definition of arm space by leveraging the statistical concept of exposure mapping. Importantly, we establish the Pareto-optimal trade-off between the estimation accuracy and regret with respect to both time period and arm space, which remains superior to the baseline even in the absence of network interference. We further propose an algorithmic implementation and model generalization.
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Tuscany (0.04)